
    Combining implicit and explicit topic representations for result diversification

    Result diversification deals with ambiguous or multi-faceted queries by providing documents that cover as many subtopics of a query as possible. Various approaches to subtopic modeling have been proposed. Subtopics have been extracted internally, e.g., from retrieved documents, and externally, e.g., from Web resources such as query logs. Internally modeled subtopics are often implicitly represented, e.g., as latent topics, while externally modeled subtopics are often explicitly represented, e.g., as reformulated queries. We propose a framework that: i) combines both implicitly and explicitly represented subtopics; and ii) allows flexible combination of multiple external resources in a transparent and unified manner. Specifically, we use a random-walk-based approach to estimate the similarities of the explicit subtopics mined from a number of heterogeneous resources: click logs, anchor text, and web n-grams. We then use these similarities to regularize the latent topics extracted from the top-ranked documents, i.e., the internal (implicit) subtopics. Empirical results show that regularization with explicit subtopics extracted from the right resource leads to improved diversification results, indicating that the proposed regularization with (explicit) external resources yields better (implicit) topic models. Click logs and anchor text prove more effective resources than web n-grams under our experimental settings. Combining resources does not always lead to better results, but achieves robust performance. This robustness is important for two reasons: it cannot be predicted which resources will be most effective for a given query, and it is not yet known how to reliably determine the optimal model parameters for building implicit topic models.
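
    A minimal sketch of the random-walk similarity idea described above: a personalized-PageRank-style walk over a weighted subtopic graph, where the stationary visit probabilities serve as similarities. The toy weight matrix, restart probability, and iteration count are illustrative assumptions, not the paper's actual data or parameters.

    import numpy as np

    def random_walk_similarity(W, restart=0.15, iters=100):
        """Personalized-PageRank-style similarity: row i holds the
        stationary visit probabilities of a walk restarting at subtopic i,
        read here as similarities to the other subtopics."""
        n = W.shape[0]
        col_sums = W.sum(axis=0, keepdims=True)
        P = W / np.where(col_sums == 0, 1, col_sums)  # column-stochastic
        S = np.zeros((n, n))
        for i in range(n):
            r = np.zeros(n)
            r[i] = 1.0                    # restart at subtopic i
            p = r.copy()
            for _ in range(iters):
                p = (1 - restart) * (P @ p) + restart * r
            S[i] = p
        return S

    # Toy graph over three explicit subtopics, with edge weights standing
    # in for co-occurrence evidence pooled from click logs, anchor text,
    # and web n-grams.
    W = np.array([[0.0, 2.0, 0.5],
                  [2.0, 0.0, 1.0],
                  [0.5, 1.0, 0.0]])
    print(random_walk_similarity(W).round(3))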

    Do you need experts in the crowd? A case study in image annotation for marine biology

    Labeled data is a prerequisite for successfully applying machine learning techniques to a wide range of problems. Recently, crowd-sourcing has been shown to provide effective solutions to many labeling tasks. However, tasks in specialist domains are difficult to map to Human Intelligence Tasks (or HITs) that can be solved adequately by "the crowd". The question addressed in this paper is whether these specialist tasks can be cast in such a way that accurate results can still be obtained through crowd-sourcing. We study a case where the goal is to identify fish species in images extracted from videos taken by underwater cameras, a task that typically requires profound domain knowledge in marine biology and hence would be difficult, if not impossible, for the crowd. We show that by carefully converting the recognition task into a visual similarity comparison task, the crowd achieves agreement with the experts comparable to the agreement achieved among the experts themselves. Further, non-expert users can learn and improve their performance during the labeling process, e.g., from system feedback.
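
    Since the study's headline result is crowd-expert agreement, here is a minimal sketch of one standard way to quantify it, Cohen's kappa; the species labels below are invented toy data, not the study's annotations, and the paper does not necessarily use this exact statistic.

    from collections import Counter

    def cohens_kappa(a, b):
        """Chance-corrected agreement between two annotators."""
        n = len(a)
        observed = sum(x == y for x, y in zip(a, b)) / n
        ca, cb = Counter(a), Counter(b)
        expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
        return (observed - expected) / (1 - expected)

    crowd  = ["dascyllus", "dascyllus", "chromis", "amphiprion", "chromis"]
    expert = ["dascyllus", "chromis",   "chromis", "amphiprion", "chromis"]
    print(f"kappa = {cohens_kappa(crowd, expert):.2f}")  # 0.69 on this toy data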

    Studying User Browsing Behavior Through Gamified Search Tasks


    Artist popularity: do web and social music services agree?

    Recommending the most popular products in a catalogue is a common technique when information about users is scarce or absent. In this paper we explore different ways to measure popularity in the music domain; more specifically, we define four indices based on three social music services and on web clicks. Our study shows, first, that for most of the indices popularity is a rather stable signal, since it barely changes over time; and second, that the ranking of popular artists depends heavily on the index used to measure an artist's popularity.
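
    One natural way to test how much a ranking of popular artists depends on the chosen index is rank correlation between two indices. A minimal sketch using Kendall's tau follows; the artist scores are invented, and the paper does not necessarily use this exact statistic.

    from itertools import combinations

    def kendall_tau(xs, ys):
        """Kendall rank correlation over paired scores (ties ignored)."""
        concordant = discordant = 0
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
            s = (x1 - x2) * (y1 - y2)
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
        total = concordant + discordant
        return (concordant - discordant) / total if total else 0.0

    # Hypothetical scores for the same five artists under a social-service
    # index (play counts) and a web-click index.
    social_plays = [9500, 7200, 6100, 3000, 1200]
    web_clicks   = [8000, 8800, 5000, 3500,  900]
    print(f"tau = {kendall_tau(social_plays, web_clicks):.2f}")  # 0.80: broadly similar rankings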

    Cumulative Citation Recommendation: A Feature-aware Comparison of Approaches

    In this work, we conduct a feature-aware comparison of approaches to Cumulative Citation Recommendation (CCR), a task that aims to filter and rank a stream of documents according to their relevance to entities in a knowledge base. We conducted experiments starting with a large feature set, identified a powerful subset, and used it to compare classification and learning-to-rank algorithms. With a small set of powerful features, we achieve better performance than the state of the art. Surprisingly, our findings challenge the previously reported preference for learning to rank over classification: in our study, the classification approach outperforms the learning-to-rank approach on CCR. This indicates that comparing two approaches is problematic due to the interplay between the approaches themselves and the feature sets one chooses to use.
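
    A minimal sketch of the classification side of CCR: train a binary relevance classifier on (document, entity) feature vectors, then rank the stream by predicted relevance probability. The feature names and toy vectors are assumptions for illustration, not the feature subset identified in the paper.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy features per (document, entity) pair, e.g.
    # [mentions of the entity in the body, mentions in the title,
    #  document-entity cosine similarity] -- hypothetical features.
    X_train = np.array([[5, 1, 0.8], [0, 0, 0.1], [3, 0, 0.6],
                        [1, 0, 0.2], [7, 2, 0.9], [0, 0, 0.05]])
    y_train = np.array([1, 0, 1, 0, 1, 0])   # 1 = relevant to the entity

    clf = LogisticRegression().fit(X_train, y_train)

    # Score and rank unseen stream documents by relevance probability.
    X_stream = np.array([[4, 1, 0.7], [0, 0, 0.15], [2, 0, 0.5]])
    scores = clf.predict_proba(X_stream)[:, 1]
    ranking = np.argsort(-scores)
    print(ranking.tolist(), scores.round(3))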

    CWI at TREC 2012, KBA track and Session Track

    We participated in two tracks: the Knowledge Base Acceleration (KBA) Track and the Session Track. In the KBA track, we focused on experimenting with different approaches, as it was the first time the track was run. We experimented with supervised and unsupervised retrieval models. Our supervised approaches include language models and a string-learning system. Our unsupervised approaches include using: 1) DBpedia labels and 2) the Google Cross-Lingual Dictionary (GCLD). While the approach that uses GCLD targets both the central and relevant bins, all the rest target the central bin. The GCLD and the string-learning system outperformed the others in their respective targeted bins. Three out of the seven runs used a Hadoop cluster provided by Sara.nl to process the documents in the stream corpora; the other four runs used federated access to the same corpora, distributed among seven workstations. The goal of the Session track submission is to evaluate whether and how a logic framework for representing user interactions with an IR system can be used to improve the approximation of the relevant term distribution that another system, which is supposed to have access to the session information, will then calculate.
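
    A minimal sketch of the unsupervised label-matching idea: score a streamed document for a target entity by counting occurrences of the entity's known surface forms (e.g., DBpedia labels or GCLD aliases). The surface forms and document below are invented examples, not TREC KBA data.

    import re

    def label_match_score(doc_text, surface_forms):
        """Count occurrences in the document of any known surface form."""
        text = doc_text.lower()
        return sum(len(re.findall(re.escape(form.lower()), text))
                   for form in surface_forms)

    surface_forms = ["Basic Element", "Basic Element, Inc."]  # hypothetical aliases
    doc = "Basic Element announced a partnership; Basic Element, Inc. confirmed."
    print(label_match_score(doc, surface_forms))  # a higher score suggests centrality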